Adds cron scripts for nightly tests #482
Conversation
xylar left a comment
@grnydawn, this looks reasonable to me. It's a lot to review, so all I did for now is a fairly quick skim. But I'm happy to be involved in updating and maintaining this infrastructure. I'm sure I'll get to know it better in that way.
One more thing. Polaris has its own linting tools. See https://docs.e3sm.org/polaris/main/developers_guide/quick_start.html#code-style-for-polaris. I'd prefer that we lint the files added here. This would involve creating a polaris development environment, making a small edit to each file (e.g. adding white space), and then letting the linter do its thing. If need be, we could leave these files out of the lint checking, but I would prefer not to. Would you like me to take care of the linting?
@grnydawn, here is a commit you could cherry-pick if you want to fix most (maybe all?) of the linting issues:
cron-scripts/README.md (outdated)

Suggested change:

```diff
 ## Overview

-This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two major OMEGA ocean model components:
+This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two types of OMEGA tests:
```
cron-scripts/README.md (outdated)

Suggested change:

```diff
 This repository orchestrates the compilation, testing, and result submission to [CDash](https://my.cdash.org) for two major OMEGA ocean model components:

-- **Omega** - Next-generation ocean model
+- **Omega CTests**
```
cron-scripts/README.md (outdated)

Suggested change:

```diff
 - **Omega** - Next-generation ocean model
-- **Polaris** - MPAS-Ocean model with Omega integration
+- **Polaris** - Omega tests on MPAS meshes
```
Suggested change:

```diff
 source /etc/bashrc
-export CRONJOB_BASEDIR=/lcrc/globalscratch/ac.kimy/cronjobs
+export CRONJOB_BASEDIR=/lcrc/globalscratch/${USER}/cronjobs
```
Suggested change:

```diff
 export https_proxy=http://proxy.ccs.ornl.gov:3128/
 export no_proxy='localhost,127.0.0.0/8,*.ccs.ornl.gov'
-export CRONJOB_BASEDIR=/lustre/orion/cli115/scratch/grnydawn/cronjobs
+export CRONJOB_BASEDIR=/lustre/orion/cli115/scratch/${USER}/cronjobs
```
Suggested change:

```diff
 module load cray-python cmake
-export CRONJOB_BASEDIR=/pscratch/sd/y/youngsun/omega/cronjobs_pm-cpu
+export CRONJOB_BASEDIR=/pscratch/sd/${USER:0:1}/${USER}/omega/cronjobs_pm-cpu
```
Suggested change:

```diff
 module load cray-python cmake
-export CRONJOB_BASEDIR=/pscratch/sd/y/youngsun/omega/cronjobs_pm-gpu
+export CRONJOB_BASEDIR=/pscratch/sd/${USER:0:1}/${USER}/omega/cronjobs_pm-gpu
```
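As a side note on the `${USER:0:1}` suggestion: it is plain bash substring expansion, sketched below with an example username (`youngsun` here is illustrative only, normally `USER` is set by the login environment):

```shell
# Demonstration of the suggested user-agnostic path, assuming bash.
# ${USER:0:1} is substring expansion: the first character of the username,
# which Perlmutter's /pscratch/sd layout uses as a subdirectory.
USER=youngsun   # example value only, for demonstration
echo "/pscratch/sd/${USER:0:1}/${USER}/omega/cronjobs_pm-cpu"
# -> /pscratch/sd/y/youngsun/omega/cronjobs_pm-cpu
```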
```bash
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH -q debug
#SBATCH --account=cli115
```
Should we get the account from a place that's common with the one used to generate polaris job scripts, currently the polaris/machines/*.cfg option parallel/account?
@cbegeman , I think it would be better to get the account from a common place. Do any of the polaris/machines/*.cfg files have a parallel/account entry? I couldn’t find any account information in those files. While I do see a group entry in the cfg files, some values of group do not match the actual account name—for example, e3sm_g on PM-GPU.
It looks like they don't, but they could/should, e.g., polaris/polaris/job/__init__.py, line 76 in 1981a58.
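If a `parallel/account` option were added to the machine configs, the cron scripts could read it directly without going through Python. This is a hypothetical sketch: the `[parallel]`/`account` entry below is assumed, not currently present in the actual `polaris/machines/*.cfg` files, so the config content is a stand-in:

```shell
# Hypothetical: extract "account" from the [parallel] section of an
# INI-style machine config. The cfg content below is a stand-in.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[parallel]
system = slurm
account = cli115
EOF
# awk: set a flag inside [parallel], clear it at the next section header,
# and print the value of the first "account" key seen while flagged
account=$(awk -F' *= *' '/^\[parallel\]/{s=1; next} /^\[/{s=0}
          s && $1 == "account" {print $2; exit}' "$cfg")
echo "$account"   # -> cli115
rm -f "$cfg"
```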
```bash
if [[ "$CRONJOB_MACHINE" == "chrysalis" ]]; then
    module load python cmake
    PARMETIS_TPL="/lcrc/soft/climate/polaris/chrysalis/spack/dev_polaris_0_10_0_COMPILER_openmpi/var/spack/environments/dev_polaris_0_10_0_COMPILER_openmpi/.spack-env/view"

elif [[ "$CRONJOB_MACHINE" == "frontier" ]]; then
    module load cray-python cmake git-lfs
    PARMETIS_TPL="/ccs/proj/cli115/software/polaris/frontier/spack/dev_polaris_0_10_0_COMPILER_mpich/var/spack/environments/dev_polaris_0_10_0_COMPILER_mpich/.spack-env/view"

elif [[ "$CRONJOB_MACHINE" == "unknown" ]]; then
    echo "CRONJOB_MACHINE is not set."
    exit -1

else
    echo "It seems that the cron job is not configured with CRONJOB_MACHINE."
    exit -1

fi
```
Since we already have some logic here for different machines, it would be good to pull out the pieces that are common to each job*.sbatch file into a single file, for ease of maintenance.
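One possible shape for that refactor, sketched with placeholder names (`machine_setup.sh` and the `MACHINE_MODULES` variable are invented for illustration; the module lists are taken from the branches above, and real `module load` calls are replaced by a variable so the sketch runs anywhere):

```shell
# Hypothetical refactor: one machine_setup.sh holds the per-machine
# branches and is sourced by every job*.sbatch script.
tmp=$(mktemp -d)
cat > "$tmp/machine_setup.sh" <<'EOF'
case "${CRONJOB_MACHINE:-unknown}" in
    chrysalis) MACHINE_MODULES="python cmake" ;;
    frontier)  MACHINE_MODULES="cray-python cmake git-lfs" ;;
    *)  echo "CRONJOB_MACHINE is not set or unrecognized." >&2
        exit 1 ;;
esac
EOF
# each job*.sbatch would then just do:
CRONJOB_MACHINE=frontier      # normally set by the cron environment
. "$tmp/machine_setup.sh"
echo "$MACHINE_MODULES"       # -> cray-python cmake git-lfs
```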
```bash
POLARIS_CDASH_BASEDIR=${CRONJOB_BASEDIR}/tasks/polaris_cdash
POLARIS_CDASH_TESTDIR="${POLARIS_CDASH_BASEDIR}/tests"
OMEGA_HOME="${POLARIS_CDASH_BASEDIR}/polaris/e3sm_submodules/Omega"
MINIFORGE3_HOME="${POLARIS_CDASH_BASEDIR}/miniforge3"
```
Can we make this a command-line argument to launch_all.sh so we can use an existing miniforge install?
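A minimal sketch of what that could look like, assuming a hypothetical `--miniforge-dir` flag (not an existing launch_all.sh option):

```shell
# Hypothetical argument parsing for launch_all.sh: let the caller point
# at an existing miniforge install instead of always installing a new one.
set -- --miniforge-dir /opt/miniforge3    # stands in for the real "$@"
MINIFORGE3_HOME=""                        # empty means: install fresh later
while [ $# -gt 0 ]; do
    case "$1" in
        --miniforge-dir) MINIFORGE3_HOME="$2"; shift 2 ;;
        *) echo "unknown option: $1" >&2; exit 1 ;;
    esac
done
echo "$MINIFORGE3_HOME"   # -> /opt/miniforge3
```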
Is it preferable to test the latest Polaris with the nightly cron job? Originally, I thought of using the Polaris code base that contains the cron-scripts sub-directory, but I realized that the repo might not be up to date. I probably still need to clone or update it to ensure I am using the latest version.
Sorry, I'm not understanding your question. When we merge this PR, the polaris code base would include the cron-scripts directory. So maybe the answer is yes, clone/update polaris nightly to use the latest main
Sorry for the confusion. To run the nightly tests, crontab should execute the launch_all.sh script every day. Since all files under the cron-scripts folder, including launch_all.sh, are part of the Polaris repository, we may need to update the Polaris repository to the latest version before running launch_all.sh.
Ah, I see. I guess that's the downside of having cron-scripts in polaris.
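For what it's worth, the update-before-run pattern discussed in this thread might look like the following crontab entry; the schedule and the checkout path are placeholders, not values from this PR:

```shell
# Hypothetical nightly crontab entry: update the polaris checkout first, so
# cron-scripts/launch_all.sh itself is current, then run it (daily at 02:00):
#
#   0 2 * * * cd /path/to/polaris && git pull --ff-only && bash cron-scripts/launch_all.sh
```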
In order to use polaris/configure_polaris_envs.py (line 150 in c59cd32) … because it's trying to run the command from wherever I executed …
I get the following error when I attempt to …
Force-pushed from 65a55ab to c12641d.
@cbegeman Thanks for the review. All the suggestions make sense to me. Since the error you noted above appears to be related to Phil's recent PR (E3SM-Project/Omega#362), I'll review Phil's PR and let it be merged into Omega first, then update the branch for this PR.
@cbegeman, I have run the …
It sounds like you have both local changes and changes on the remote branch. In such circumstances, you should fetch the remote branch and then rebase your local branch onto the remote one. You may then end up with merge conflicts, which you need to resolve in the usual way.
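The fetch-and-rebase workflow described above might look like this; the repositories and branch names below are throwaway stand-ins created only to demonstrate the commands (in practice the remote would be this PR's fork and branch):

```shell
# Demonstration of fetch + rebase onto an updated remote branch, using
# temporary repositories so the commands can actually run.
set -e
tmp=$(mktemp -d)
git init -q -b main "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=a@b -c user.name=a \
    commit -q --allow-empty -m base
git clone -q "$tmp/upstream" "$tmp/local"
# the remote branch gains a commit while we work...
git -C "$tmp/upstream" -c user.email=a@b -c user.name=a \
    commit -q --allow-empty -m remote-change
# ...and we make a local commit of our own
git -C "$tmp/local" -c user.email=a@b -c user.name=a \
    commit -q --allow-empty -m local-change
git -C "$tmp/local" fetch -q origin            # fetch the remote branch
git -C "$tmp/local" -c user.email=a@b -c user.name=a \
    rebase -q origin/main                      # replay local work on top
git -C "$tmp/local" log --format=%s            # local-change, remote-change, base
```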
This PR adds cron scripts to Polaris for running nightly Omega and Polaris tests, and initiates discussion regarding this feature.